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Status of this Memo 


This memo provides information for the Internet community. It does 
not specify an Internet standard. Distribution of this memo is 
unlimited. 

Abstract 


This document illustrates the growth of the Internet by examination 
of entries in the Domain Name System (DNS) and pre-DNS host tables. 
DNS entries are collected by a program called ZONE, which searches 
the Internet and retrieves data from all known domains. Pre-DNS host 
table data were retrieved from system archive tapes. Various 
statistics are presented on the number of hosts and domains. 
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Introduction 


This document provides statistics on the growth of the Internet by 
examining the number of Internet hosts and domains over a 10-year 
period. Before the Domain Name System was established, practically 
all hosts on the Internet were registered with the Network 
Information Center (SRI-NIC) and entries were placed in the Official 
Host Table for each one. Data on the number of hosts for pre-DNS 
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years comes from copies of the host table at selected times. The DNS 
system was introduced around 1984 but took almost 4 years before it 
was fully implemented on the Internet. However, by this time many 
hosts were no longer registered in the Host Table. 


In 1986, the ZONE (Zealot Of Name Edification) program was written. 
ZONE was originally intended to be used during the host-table-to-DNS 
transition period. ZONE would "walk" the DNS tree and build a host 
table of all the information it collected. This host table could 
then be used by sites that had not yet made the DNS transition. 
However, ZONE was never used for this purpose. Instead, it was found 
to be useful for collecting statistics on the size of the domain 
system and the Internet. 


ZONE could not collect complete data on the DNS until around 1988, 
because early versions of BIND (the popular Unix DNS implementation) 
had major problems with the zone transfer function of the DNS 
protocol. ZONE has been used in varying ways ever since to collect 
this information. In the first few years, it was used to produce a 
wall-size chart of the domain tree. However, the number of domains 
quickly outgrew the size of the wall and the charts were abandoned. 
In later years, statistics on the number of hosts and domains were 
extracted from the resulting host table, sometimes categorizing data 
based on top-level domain names or on computer system type or 
manufacturer. 


The time to gather the data also grew from hours to a week, and the 
size of the host table produced soon reached 50 megabytes. In order 
to reduce the amount of data collected, ZONE is now run in a mode 
collecting only host names and IP addresses, ignoring protocol, host 
information and MX record data. The host table is then groveled over 
by some utilities (such as sort, uniq and grep) to produce the 
statistics required. ZONE is currently run every 3 months at SRI. 


How ZONE Works 


ZONE maintains a list of domains and their servers and a flag 
indicating whether information for a domain has been successfully 
loaded from one of the servers. Because of another bug in BIND, ZONE 
must be primed with a list of all the top-level domains and their 
name servers. It then cycles through the domain list, attempting to 
contact one of the servers for each domain not yet transferred. When 
a server is contacted (via TCP), a Start of Authority (SOA) query is 
first sent to make sure the server is authoritative for the domain 
being requested. If so, then a zone transfer query (AXFR) is sent to 
request all the resource records for the domain to be retrieved. 


When a name server record (NS) is received, the referenced domain and 
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server are added to the list of domains to process. When host 
records (A, CNAME, HINFO, MX) are received, they are added to an in- 
core table of host information. The program ends when it has cycled 
through the entire list of domains without receiving any new 
information. It then dumps the table of host information to a 
HOSTS.TXT format file. 


Problems with Data Collection 


For various reasons, some Internet sites do not allow zone transfers 
of their domain servers. ZONE also eventually gives up trying to 
transfer a domain after too many failures. The number of domains 
that could not be zone transferred during the 1-Jan-92 ZONE run was 
around 800 out of 17,000. Additionally, it is assumed that not all 
hosts on the Internet are registered in a domain server. These 
problems cause the statistics gathered by ZONE to be lower than the 
actual amounts. 


Manual review of some of the data collected by ZONE also shows a lot 
of random entries in the DNS. Misformatted entries may cause bogus 
server or host records to appear. Many times a server is found to 
not be authoritative for the domain listed. Sometimes entire domains 
are renamed and their old entries left in place for a transition 
period, thus causing each host within that domain to be counted 
twice. These problems cause the results of ZONE to be higher than 
the actual amounts. 


Manual scanning of the data indicates that the additional entries are 
insignificant compared to the missing entries discussed earlier. 

ZONE data can thus be viewed as the minimum number of Internet hosts, 
and not the actual figures. 


A final problem with data collection is that of expense. Downloading 
domain information from every domain on the Internet generates a 
large amount of network traffic. It also puts an extra CPU load on 
each domain server it must contact. An organized effort might be 
considered to have only one such program doing this on the Internet 
at regularly scheduled intervals to keep the problem of multiple data 
collectors from occurring. 


Scope of the Study 


A problem with counting hosts and domains on the Internet is defining 
what the Internet really is. Finding host entries in the DNS does 
not necessarily indicate that the host is reachable from the 
Internet. Many companies have mail gateways between the Internet and 
their local nets, thus disallowing direct access. However, some of 
these companies advertise all their hosts, and some advertise only 
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the gateway. Are these hosts on the Internet or not? 


Furthermore, many domains in the DNS are just mail-forwarding (MX) 
entries for off-Internet (such as Usenet) sites. Are these domains 
really part of the Internet and should they be counted in an Internet 
size study? 


For the purposes of this study, a host has been defined as a 

[name (s), IP-address (es)] grouping discovered from the DNS. This 
prevents us from counting a host with multiple names or addresses 
more than once. However, this does not consider whether the host is 
directly accessible or not. When ZONE counts the number of domains 
it includes all domains referenced by an NS record in the DNS, thus 
including MX-only domain sites in the final results. 


N. Results 


This section presents data from archive tapes of SRI-NIC from 1981 to 
1986, and statistics gathered by runs of ZONE from 1986 to 1992. 


N.1 Number of Internet Hosts 


The chart below shows the number of IP hosts on the Internet. These 
are hosts with at least one IP address assigned. Data was collected 
by ZONE except where noted. The following two sections are graphs of 
the data in this chart. 


Date Hosts 
08/81 213 Host table #152 
05/82 235 Host table #166 
08/83 562 Host table #300 
10/84 1,024 Host table #392 
10/85 1,961 Host table #485 
02/86 2,308 Host table #515 
11/86 5,089 
12/87 28,174 
07/88 33,000 
10/88 56,000 
01/89 80,000 
07/89 130,000 
10/89 159,000 
10/90 313,000 
01/91 376,000 
07/91 535,000 
10/91 617,000 
01/92 727,000 
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Number of Internet Hosts (linear) 
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This graph is a linear plot of the number of Internet hosts. 


Lottor [Page 5] 


RFC 1296 Internet Growth (1981-1991) January 1992 


Number of Internet Hosts (logarithmic) 


| 1000000 
| x * 
| KK, * 
| 
| 100000 i 
x x 
p a 
fe) | = 
s | 10000 
t | 
sS | = 
| x x 
| vs 
| =i 
E 
100|. 
8 8 8 8 8 8 8 8 8 9 9 E 
1 2 3 4 5 6 7 8 9 y ' 2 
Date 
wee = data point, "." = estimate 


This graph is a logarithmic plot of the number of Internet hosts. 
N.2 Number of Domains 


This chart shows the number of domains existing in the Internet 
Domain Name System as collected by ZONE. 


Date Domains 
07/88 900 
10/88 1,280 
01/89 2,600 
07/89 3,900 
10/89 4,800 
10/90 9,300 
01/91 11,200 
07/91 16,000 
10/91 18,000 
01/92 17,000 
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N.3 Distribution of IP Addresses per Host 


This chart shows how many hosts have how many IP addresses. This 
data was collected on 1-Jan-92 and only the first 10 entries are 
shown. 
Addresses Hosts 
1 715143 
2 9015 
3 1027 
4 556 
5 314 
6 213 
7 100 
8 85 
9 58 
10 71 


N.4 Distribution of Hosts by Top-level Domain 


This chart shows the number of hosts per top-level domain (top 40 
only) on 1-Jan-92. The percentage listed is the increase since 1- 
Oct-91. Large variations are probably due to problems and variations 
in the collection process; these figures are not meant to be 
authoritative, but serve as reasonable estimates. 


243020 edu 13% 13011 fr 4% 1791 dk 4% 357 be -5% 
181361 com 12% 12770 nl 21% 1662 es 15% 334 gr 14% 
46463 gov 13% 12647 ch 10% 1506 kr 9% 308 br 26% 
31622 au 19% 11994 fi 15% 1111 nz -16% 284 mx -5% 
31016 de 20% 10228 no 9% 1016 tw n/a 207 is % 
27492 mil 26% 8579 jp 6% 929 za n/a 146 pl 97% 
27052 ca 22% 4109 net -49% 784 pt n/a 127 us 25% 
19117 org 10% 3324 at 19% 484 sg 251% 25: -En 0% 
18984 uk 139% 2719 it 197% 448 hk 78% 24 hu 71% 
18473 se 34% 2020 il 14% 374 ie -7% 6 arpa 0% 
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Distribution of Hosts by Host Name 


This chart shows the distribution of hosts by their host name on 1- 
Jan-92. The host name is defined to be the first part of a fully 
qualified domain name. Only the top 100 names are shown. 


venus 204 mac4 172 mac9 155 pollux 138 chaos 
pluto 201 hobbes 172 maclil 155 frodo 136 bart 
mars 201 hermes 170 mac8 153 helios 135 pc5 
jupiter 198 thor 169 phoenix 152 macl7 135 larry 
saturn 198 sirius 169 mac12 151 vega 135 cs 
pel 196 gw 169 hal 151 macl18 133 odin 
zeus 195 calvin 168 snoopy 150 falcon 131 tiger 
iris 194 mac5 168 macl3 150 bach 131 sparky 
mercury 191 macl0 167 macl5 146 castor 131 ariel 
macl 190 fred 167 macl4 145 sol 130 sneezy 
orion 189 titan 167 grumpy 145 dopey 128 mac 
mac2 189 pc3 163 gandalf 144 mac20 127 sunl 
newton 186 opus 162 pc4 144 macl19 127 rocky 
neptune 186 mac6 160 uranus 142 spock 126 pc6 
pc2 185 charon 159 macl6 142 euler 125 hydra 
gauss 185 apollo 158 sleepy 141 mickey 125 homer 
eagle 179 mac7 158 io 141 atlas 124 isis 
mac3 179 athena 157 earth 140 maxwell 123 moe 
merlin 177 alpha 156 europa 140 happy 123 delta 
cisco 172 mozart 155 rigel 140 doc 122 pc10 
ure Issues 


ZONE currently runs on a DECsystem-20 and is written in assembler. 
The amount of data is quickly reaching the limits of the DEC-20 
section address space, and the hardware’s ability to survive gets 
slimmer each day. ZONE assembles all its data in core before dumping 
it to disk. The implementation does this in order to be able to 
match host nicknames with official names before dumping complete host 
records. Sometimes a nickname can be in a different domain than the 
official name, complicating simpler methods. 


A new version of ZONE needs to be written to run on a modern computer 
system. A completely new architecture should be designed to handle 
the enormous amount of data collected and expected in the future. 
Data should be kept on disk so that a system crash will not wipe out 
days of collection. Multiple zone transfers could be occurring in 
parallel to reduce the time needed for data gathering. A new ZONE 
might run continuously, cycling through the domain system on a cycle 
lasting weeks to a month, updating a local database with statistics 
collected for each domain. In this way, current statistics on the 
size of the Internet would always be known. The resulting database 
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may also be useful for other network information services. 
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Security Considerations 

Security issues are not discussed in this memo. 
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